Not all links are equal: Exploiting Dependency Types for the Extraction of Protein-Protein Interactions from Text

Authors

  • Philippe E. Thomas
  • Stefan Pietschmann
  • Illés Solt
  • Domonkos Tikk
  • Ulf Leser
Abstract

The extraction of protein-protein interactions (PPIs) reported in scientific publications is one of the most studied topics in Text Mining in the Life Sciences, as such algorithms can substantially decrease the effort for database curators. The currently best methods for this task are based on analyzing the dependency tree (DT) representation of sentences. Many approaches exploit only topological features and thus do not yet fully exploit the information contained in DTs. Using a pattern-matching approach, we show that incorporating the grammatical information encoded in the types of the dependencies in DTs noticeably improves extraction performance. We automatically infer a large set of linguistic patterns using only information about interacting proteins. Patterns are then refined based on shallow linguistic features and the semantics of dependency types. Together, these lead to a total improvement of 17.2 percentage points in F1, as evaluated on five publicly available PPI corpora. More than half of that improvement is gained by properly handling dependency types. Our method provides a general framework for building task-specific relationship extraction methods that do not require annotated training data. Furthermore, our observations offer methods to improve upon relation extraction approaches.
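To make the contrast between topology-only and type-aware patterns concrete, the following is a minimal sketch, not the authors' implementation. It assumes a dependency tree given as hypothetical `(head, dependent, label)` triples, with Stanford-style labels such as `nsubj` and `prep_with` used purely for illustration. The helper names `shortest_path`, `topological_pattern`, and `typed_pattern` are invented for this example.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected view of the dependency tree.

    Returns the path from start to goal as a list of (label, direction)
    steps, where direction is "up" (dependent to head) or "down".
    """
    graph = {}
    for head, dep, label in edges:
        graph.setdefault(head, []).append((dep, label, "down"))
        graph.setdefault(dep, []).append((head, label, "up"))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, label, direction in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(label, direction)]))
    return None

def topological_pattern(path):
    # Keeps only the shape of the path and discards dependency types.
    return tuple(direction for _, direction in path)

def typed_pattern(path):
    # Keeps the dependency type of every edge on the path.
    return tuple(f"{label}:{direction}" for label, direction in path)

# Hypothetical parse of "PROT_A activates PROT_B"
sentence_1 = [("activates", "PROT_A", "nsubj"),
              ("activates", "PROT_B", "dobj")]
# Hypothetical parse of "PROT_A was detected with PROT_B"
sentence_2 = [("detected", "PROT_A", "nsubjpass"),
              ("detected", "was", "auxpass"),
              ("detected", "PROT_B", "prep_with")]

path_1 = shortest_path(sentence_1, "PROT_A", "PROT_B")
path_2 = shortest_path(sentence_2, "PROT_A", "PROT_B")

print(topological_pattern(path_1), topological_pattern(path_2))
print(typed_pattern(path_1), typed_pattern(path_2))
```

In this toy case both sentences yield the same topological pattern, while the typed patterns differ, which is the kind of distinction a type-aware matcher can exploit.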

Similar resources

Exploiting Grammatical Relations for Protein Relation Extraction and Role Labeling

Automatic protein interaction mining from natural language texts and automatic identification of the agent and target proteins (i.e. role labeling) are challenging problems that attract a lot of attention because of the growing amount of biomedical text resources. We propose a novel approach that relies exclusively on parsing and dependency information. We strategically omit any context informa...

©2011 The Association for Computational Linguistics. Order copies of this and other ACL proceedings from:

The extraction of protein-protein interactions (PPIs) reported in scientific publications is one of the most studied topics in Text Mining in the Life Sciences, as such algorithms can substantially decrease the effort for database curators. The currently best methods for this task are based on analyzing the dependency tree (DT) representation of sentences. Many approaches exploit only topologi...

A New Thermodynamic Approach for Protein Partitioning in Reverse Micellar Solution

Reverse micellar systems are nanofluids with unique properties that make them attractive in high selectivity separation processes, especially for biological compounds. Understanding the phase behavior and thermodynamic properties of these nanosystems is the first step in process design. Separation of components by these nanosystems is performed upon contact of aqueous and reverse micellar phase...

Discovering Domains Mediating Protein Interactions

Background: Protein-protein interactions do not provide any direct information regarding the domains within the proteins that mediate the interactions. The majority of proteins are multi domain proteins and the interaction between them is often defined by the pairs of their domains. Most of the former studies focus only on interacting domain pairs. However they do not consider the in...

Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks

Background: Prediction of protein localization is among the most important issues in bioinformatics and is used for the prediction of proteins in cells and organelles such as mitochondria. In this study, several machine learning algorithms are applied to predict intracellular protein locations. These algorithms use the features extracted from pro...

Publication date: 2011